
The Origin of Edge of Stability

Litman, Elon

arXiv.org Machine Learning

Full-batch gradient descent on neural networks drives the largest Hessian eigenvalue to the threshold $2/η$, where $η$ is the learning rate. This phenomenon, the Edge of Stability, has resisted a unified explanation: existing accounts establish self-regulation near the edge but do not explain why the trajectory is forced toward $2/η$ from arbitrary initialization. We introduce the edge coupling, a functional on consecutive iterate pairs whose coefficient is uniquely fixed by the gradient-descent update. Differencing its criticality condition yields a step recurrence with stability boundary $2/η$, and a second-order expansion yields a loss-change formula whose telescoping sum forces curvature toward $2/η$. The two formulas involve different Hessian averages, but the mean value theorem localizes each to the true Hessian at an interior point of the step segment, yielding exact forcing of the Hessian eigenvalue with no gap. Setting both gradients of the edge coupling to zero classifies fixed points and period-two orbits; near a fixed point, the problem reduces to a function of the half-amplitude alone, which determines which directions support period-two orbits and on which side of the critical learning rate they appear.
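The $2/\eta$ stability boundary at the heart of this abstract can be seen in its simplest form on a quadratic. A minimal sketch (illustrative only, not the paper's construction): for $f(x) = ax^2/2$, whose Hessian eigenvalue is simply $a$, the gradient-descent map is $x \mapsto (1 - \eta a)x$, which contracts for $a < 2/\eta$, oscillates with constant amplitude exactly at $a = 2/\eta$, and diverges beyond it.

```python
# Toy model of the 2/eta stability boundary on f(x) = a*x^2/2,
# where the Hessian eigenvalue is the constant a.
def gd_trajectory(a, eta, x0=1.0, steps=50):
    xs = [x0]
    for _ in range(steps):
        xs.append((1.0 - eta * a) * xs[-1])  # GD map: x <- (1 - eta*a) x
    return xs

eta = 0.1
threshold = 2.0 / eta                          # the stability boundary 2/eta
stable   = gd_trajectory(threshold - 1.0, eta) # a = 19: |1 - eta*a| < 1
critical = gd_trajectory(threshold,       eta) # a = 20: |1 - eta*a| = 1
unstable = gd_trajectory(threshold + 1.0, eta) # a = 21: |1 - eta*a| > 1

print(abs(stable[-1]) < 1e-2)                # True: contraction
print(abs(abs(critical[-1]) - 1.0) < 1e-9)   # True: period-two oscillation
print(abs(unstable[-1]) > 1.0)               # True: divergence
```

The neural-network phenomenon is that curvature is *driven toward* this boundary during training, which the linear model above cannot capture; it only shows why $2/\eta$ is the critical value.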



Group-Fair Online Allocation in Continuous Time

Neural Information Processing Systems

The theory of discrete-time online learning has been successfully applied to many problems that involve sequential decision-making under uncertainty. However, in many applications, including contractual hiring in online freelancing platforms and server allocation in cloud computing systems, the outcome of each action is observed only after a random and action-dependent time. Furthermore, as a consequence of certain ethical and economic concerns, the controller may impose deadlines on the completion of each task, and require fairness across different groups in the allocation of the total time budget $B$. In order to address these applications, we consider a continuous-time online learning problem with fairness considerations, and present a novel framework based on continuous-time utility maximization. We show that this formulation recovers reward-maximizing, max-min fair, and proportionally fair allocation rules across different groups as special cases. We characterize the optimal offline policy, which allocates the total time between different actions in an optimally fair way (as defined by the utility function) and imposes deadlines to maximize time-efficiency. In the absence of any statistical knowledge, we propose a novel online learning algorithm based on dual ascent optimization for time averages, and prove that it achieves a $\tilde{O}(B^{-1/2})$ regret bound.
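The three allocation rules the abstract recovers as special cases can be illustrated with the standard $\alpha$-fair utility family $U_\alpha(u) = u^{1-\alpha}/(1-\alpha)$ (with $U_1 = \log$); this is a hedged sketch under that assumption, using the textbook KKT closed form for linear group rewards, not the paper's online algorithm:

```python
import numpy as np

# alpha-fair allocation of budget B across groups with linear rewards
# u_g = r_g * x_g: alpha = 0 is reward maximization, alpha = 1 is
# proportional fairness, alpha -> infinity approaches max-min fairness.
# KKT stationarity gives x_g proportional to r_g**((1 - alpha) / alpha).
def alpha_fair_allocation(r, B, alpha):
    r = np.asarray(r, dtype=float)
    if alpha == 0.0:                      # pure reward maximization
        x = np.zeros_like(r)
        x[np.argmax(r)] = B
        return x
    w = r ** ((1.0 - alpha) / alpha)      # KKT stationarity weights
    return B * w / w.sum()

r, B = [3.0, 1.0], 4.0
print(alpha_fair_allocation(r, B, 0.0))   # all budget to the r = 3 group
print(alpha_fair_allocation(r, B, 1.0))   # proportional fair: equal split
print(alpha_fair_allocation(r, B, 50.0))  # near max-min: equalizes r_g * x_g
```

The paper's contribution is reaching such allocations online, without knowing the reward statistics, via dual ascent; the closed form above only shows what the offline targets look like.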


POMDPs in Continuous Time and Discrete Spaces

Neural Information Processing Systems

Many processes, such as discrete event systems in engineering or population dynamics in biology, evolve in discrete space and continuous time. We consider the problem of optimal decision making in such discrete state and action space systems under partial observability. This places our work at the intersection of optimal filtering and optimal control. At the current state of research, a mathematical description of simultaneous decision making and filtering in continuous time with finite state and action spaces is still missing. In this paper, we give a mathematical description of a continuous-time partially observable Markov decision process (POMDP). By leveraging optimal filtering theory, we derive a Hamilton-Jacobi-Bellman (HJB) type equation that characterizes the optimal solution. Using techniques from deep learning, we approximately solve the resulting partial integro-differential equation. We present (i) an approach that solves the decision problem offline by learning an approximation of the value function and (ii) an online algorithm that provides a solution in belief space using deep reinforcement learning. We show the applicability of both on a set of toy examples, which pave the way for future methods providing solutions for high-dimensional problems.
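A hedged toy sketch of the filtering half of the problem (not the paper's method): tracking the belief over a two-state continuous-time Markov chain from noisy observations via a crude Euler discretization with step `dt`. The belief vector is the quantity on which a continuous-time POMDP policy in belief space would act; the generator, emission model, and noise level below are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = np.array([[-1.0,  1.0],      # generator: state 0 -> 1 at rate 1
              [ 2.0, -2.0]])     #            state 1 -> 0 at rate 2
dt, T = 0.01, 1000
emission = np.array([0.0, 1.0])  # state-dependent observation mean
sigma = 0.5                      # observation noise std

# Simulate the latent chain and noisy observations.
state, obs = 0, []
for _ in range(T):
    if rng.random() < -Q[state, state] * dt:   # jump with prob ~ rate * dt
        state = 1 - state
    obs.append(emission[state] + sigma * rng.normal())

# Filter: Kolmogorov-forward prediction step + Bayes correction step.
b = np.array([0.5, 0.5])
for y in obs:
    b = b + dt * (b @ Q)                              # prediction
    b = b * np.exp(-0.5 * ((y - emission) / sigma) ** 2)  # Gaussian likelihood
    b = b / b.sum()                                   # renormalize

print(b)  # posterior belief over states {0, 1} after the last observation
```

The paper instead works with the exact continuous-time filtering equations and couples them to an HJB-type equation for control; this sketch only conveys what "belief space" means here.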



Nash Equilibrium and Belief Evolution in Differential Games

Zhou, Jiangjing, Petrosian, Ovanes, Zhang, Ye, Gao, Hongwei

arXiv.org Artificial Intelligence

Differential games [4, 6] involve multiple players controlling a dynamical system through their actions, with the system's evolution described by differential state equations. These games evolve over a continuous-time horizon, where each player seeks to optimize an objective function that depends on the system's state, their own actions, and potentially the actions of others. In this study, we extend the classic differential game model to scenarios involving motion-payoff uncertainty, where players face uncertainties in both the dynamic equations and the payoff functions, and are unaware of certain parameters in the environment or in their opponents' payoff structures. In dynamic games, optimal control techniques are generalized to accommodate multiple players with both shared and conflicting interests. As shown in [9], if a set of interconnected partial differential equations--commonly referred to as the Hamilton-Jacobi-Bellman (HJB) equations--has solutions, then a Nash equilibrium can be achieved. At this equilibrium, no player can improve their outcome by unilaterally changing their strategy. However, traditional dynamic game models often assume that all players possess complete knowledge of the game. In many real-world scenarios, players face rapidly changing and uncertain environments, leading to incomplete information about the system's dynamics and payoffs [22, 3, 15, 1]. To address this uncertainty, we apply Bayesian updating methods, whereby players update their beliefs about unknown parameters as new information becomes available.
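The Bayesian updating step mentioned at the end can be illustrated with the simplest conjugate case; a minimal sketch, assuming a Gaussian prior over a scalar payoff parameter and Gaussian observation noise (the parameter, noise level, and observation model are all illustrative, not the paper's):

```python
import numpy as np

# Conjugate Gaussian belief update: a player holds theta ~ N(mu, var)
# and observes y = theta + noise, noise ~ N(0, obs_var).
def gaussian_update(mu, var, y, obs_var):
    k = var / (var + obs_var)              # Kalman-style gain
    return mu + k * (y - mu), (1.0 - k) * var

rng = np.random.default_rng(1)
theta_true, obs_var = 2.0, 0.25
mu, var = 0.0, 10.0                        # diffuse prior belief
for _ in range(200):
    y = theta_true + rng.normal(scale=obs_var ** 0.5)
    mu, var = gaussian_update(mu, var, y, obs_var)

print(mu, var)   # posterior mean near the true parameter, variance shrunk
```

In the game setting, each player runs such an update on the unknown parameters of the dynamics or of the opponents' payoffs and re-solves for its equilibrium strategy under the current belief.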




ZORMS-LfD: Learning from Demonstrations with Zeroth-Order Random Matrix Search

Dry, Olivia, Molloy, Timothy L., Jin, Wanxin, Shames, Iman

arXiv.org Artificial Intelligence

We propose Zeroth-Order Random Matrix Search for Learning from Demonstrations (ZORMS-LfD). ZORMS-LfD enables the costs, constraints, and dynamics of constrained optimal control problems, in both continuous and discrete time, to be learned from expert demonstrations without requiring smoothness of the learning-loss landscape. In contrast, existing state-of-the-art first-order methods require the existence and computation of gradients of the costs, constraints, dynamics, and learning loss with respect to states, controls and/or parameters. Most existing methods are also tailored to discrete time, with constrained problems in continuous time receiving only cursory attention. We demonstrate that ZORMS-LfD matches or surpasses the performance of state-of-the-art methods in terms of both learning loss and compute time across a variety of benchmark problems. On unconstrained continuous-time benchmark problems, ZORMS-LfD achieves similar loss performance to state-of-the-art first-order methods with an over $80$\% reduction in compute time. On constrained continuous-time benchmark problems where there is no specialized state-of-the-art method, ZORMS-LfD is shown to outperform the commonly used gradient-free Nelder-Mead optimization method.
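A hedged sketch of a generic two-point zeroth-order random-search step, in the spirit of (but not identical to) ZORMS-LfD: the learning loss is treated as a black box and a descent direction is estimated from two evaluations along a random perturbation, so no gradients of costs, constraints, or dynamics are needed. The quadratic "distance to the expert's parameters" loss below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-point zeroth-order estimate: g ~ (directional derivative) * direction.
def zeroth_order_step(loss, theta, mu=1e-3, lr=0.1):
    u = rng.standard_normal(theta.shape)   # random search direction
    g = (loss(theta + mu * u) - loss(theta - mu * u)) / (2.0 * mu) * u
    return theta - lr * g                  # descend the estimated direction

# Black-box "learning loss": distance of parameters to an expert's parameters.
expert = np.array([1.0, -2.0])
loss = lambda th: float(np.sum((th - expert) ** 2))

theta = np.zeros(2)
for _ in range(2000):
    theta = zeroth_order_step(loss, theta)
print(theta)   # converges toward the expert parameters
```

In expectation over the random direction, the two-point estimate is an unbiased estimate of the gradient for smooth losses, which is what lets such methods work when the loss landscape (as in learning-from-demonstrations bilevel problems) is available only through evaluations; ZORMS-LfD's specific random *matrix* perturbations are not reproduced here.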


Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do

Wald, Yoav, Goldstein, Mark, Efroni, Yonathan, van Amsterdam, Wouter A. C., Ranganath, Rajesh

arXiv.org Artificial Intelligence

Problems in fields such as healthcare, robotics, and finance require reasoning about the value of both what decision or action to take and when to take it. The prevailing hope is that artificial intelligence will support such decisions by estimating the causal effect of policies, such as how to treat patients or how to allocate resources over time. However, existing methods for estimating the effect of a policy struggle with \emph{irregular time}: they either discretize time or disregard the effect of timing policies. We present Earliest Disagreement Q-Evaluation (EDQ), a new deep-Q algorithm that estimates the effect of both when and what to do. EDQ makes use of a recursion for the Q-function that is compatible with flexible sequence models, such as transformers, and provides accurate estimates under standard assumptions. We validate the approach through experiments on survival time and tumor growth tasks.
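As background for the Q-function recursion the abstract builds on, here is a hedged sketch of plain tabular Q-evaluation on a toy MDP: a generic Bellman recursion $Q(s,a) = r(s,a) + \gamma\,Q(s', \pi(s'))$, not the EDQ algorithm itself, which extends this idea to irregular, action-dependent timing with sequence models. The MDP below is entirely illustrative.

```python
import numpy as np

gamma = 0.9
# 2-state, 2-action deterministic MDP: action 0 stays, action 1 switches.
P = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}   # next state P[s][a]
r = lambda s, a: float(s)                # reward depends only on the state
pi = {0: 1, 1: 0}                        # target policy: reach and hold state 1

# Iterate the Bellman evaluation recursion to its fixed point.
Q = np.zeros((2, 2))
for _ in range(500):
    for s in (0, 1):
        for a in (0, 1):
            s2 = P[s][a]
            Q[s, a] = r(s, a) + gamma * Q[s2, pi[s2]]

print(Q)   # e.g. Q[1, 0] is the value of staying in state 1 under pi
```

With $\gamma = 0.9$ and the policy holding state 1, the fixed point satisfies $V(1) = 1/(1-\gamma) = 10$, which the iteration recovers; the estimation challenge EDQ addresses is doing this from observational sequences where the time between actions is itself random and policy-dependent.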